# RLHF Optimization

## RM-R1-DeepSeek-Distilled-Qwen-32B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 506 · **Likes:** 0

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning trajectories, yielding interpretable evaluations.
## RM-R1-Qwen2.5-Instruct-7B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 23 · **Likes:** 2

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning traces, significantly improving accuracy and interpretability over traditional scalar reward models.
## RM-R1-Qwen2.5-Instruct-14B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 21 · **Likes:** 1

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning traces, providing explainable assessments.
## RM-R1-Qwen2.5-Instruct-32B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 29 · **Likes:** 1

RM-R1 is a framework for reward modeling through reasoning-trajectory generation, offering significant gains in accuracy and interpretability over traditional methods.
## Llama-3-OffsetBias-RM-8B

**Publisher:** NCSOFT · **Tags:** Large Language Model, Transformers, English · **Downloads:** 1,782 · **Likes:** 23

A reward model trained on the OffsetBias dataset, offering improved robustness against common biases in evaluation models.
## LLaMA-3-8B-SFR-SFT-R

**Publisher:** Salesforce · **Tags:** Large Language Model, Transformers · **Downloads:** 22 · **Likes:** 8

A supervised fine-tuned model based on LLaMA-3-8B, developed by Salesforce for the supervised fine-tuning (SFT) stage of reinforcement learning from human feedback (RLHF) workflows.
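The SFT stage mentioned above typically trains with ordinary next-token cross-entropy, but only on the response tokens: prompt positions are excluded from the loss via the `-100` label convention used by PyTorch cross-entropy. A minimal sketch of that label masking, independent of any particular model (the token IDs are made up for illustration):

```python
IGNORE_INDEX = -100  # convention: labels with this value are skipped by the loss


def build_sft_labels(
    prompt_ids: list[int], response_ids: list[int]
) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask prompt tokens out of the loss.

    The model still *sees* the prompt as input context, but gradient flows
    only through positions whose label is a real token ID.
    """
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


inp, lab = build_sft_labels([101, 7, 8], [42, 43, 2])
print(inp)  # [101, 7, 8, 42, 43, 2]
print(lab)  # [-100, -100, -100, 42, 43, 2]
```

Masking the prompt keeps the model from being rewarded for merely copying instructions, which is why RLHF pipelines apply it before the reward-modeling and RL stages.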
## JSL-MedMNX-7B

**Publisher:** johnsnowlabs · **Tags:** Large Language Model, Transformers, English · **Downloads:** 2,665 · **Likes:** 5

A 7-billion-parameter medical large language model developed by John Snow Labs, optimized for the biomedical domain.
## AmberSafe

**License:** Apache-2.0 · **Publisher:** LLM360 · **Tags:** Large Language Model, Transformers, English · **Downloads:** 52 · **Likes:** 7

AmberSafe is a safety fine-tuned instruction model based on LLM360/AmberChat, part of the LLM360 Pebble series, focused on safe text generation.
## Xwin-LM-13B-V0.2

**Publisher:** Xwin-LM · **Tags:** Large Language Model, Transformers · **Downloads:** 713 · **Likes:** 51

Xwin-LM is a large language model alignment suite built on Llama 2, with strong performance on the AlpacaEval benchmark.
## Xwin-LM-7B-V0.1

**Publisher:** Xwin-LM · **Tags:** Large Language Model, Transformers · **Downloads:** 755 · **Likes:** 77

Xwin-LM is a large language model alignment solution built on Llama 2, covering supervised fine-tuning and reward modeling; the 7B version performs strongly on the AlpacaEval benchmark.
## GPT2-Open-Instruct-V1-Anthropic-HH-RLHF

**License:** MIT · **Publisher:** jtatman · **Tags:** Large Language Model, Transformers, English · **Downloads:** 125 · **Likes:** 5

A dialogue model based on GPT2-open-instruct and fine-tuned on the Anthropic/hh-rlhf dataset, performing well on conversational prompts.
## Reward-Model-DeBERTa-V3-Large-V2

**License:** MIT · **Publisher:** OpenAssistant · **Tags:** Large Language Model, Transformers, English · **Downloads:** 11.15k · **Likes:** 219

This reward model is trained to predict which of two generated answers humans would prefer for a given question. It is suitable for QA evaluation, RLHF reward scoring, and detecting toxic answers.
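Preference reward models of this kind are commonly trained with a pairwise Bradley-Terry objective: the model scores both the human-preferred ("chosen") and dispreferred ("rejected") answers, and the loss is `-log sigmoid(r_chosen - r_rejected)`. A minimal sketch of that objective in plain Python (the training details of any particular model above may differ):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log P(chosen preferred) = -log sigmoid(r_c - r_r)."""
    return -math.log(sigmoid(r_chosen - r_rejected))


# The loss shrinks as the reward margin in favor of the chosen answer grows:
print(pairwise_reward_loss(2.0, 0.0))  # small loss: chosen scored higher
print(pairwise_reward_loss(0.0, 2.0))  # large loss: rejected scored higher
```

Minimizing this loss pushes the scalar reward to rank preferred answers above dispreferred ones, which is exactly the signal later consumed by the RL stage of RLHF.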
© 2025 AIbase